Using corpora tools to analyze gradable nouns in Dutch
Abstract
In this paper, we expand on Morzycki's (2009) claim that degree readings of size adjectives are attributable to syntax. We introduce a corpus-based analysis of Dutch to verify this claim and extend it into the semantic domain. Using the LASSY Treebank, we extract syntactic and semantic properties of noun phrases containing the adjectives “gigantisch”, “kolossaal”, and “reusachtig”, and manually annotate each adjective-noun pair as gradable or nongradable. Using these features, we fit a logistic regression model and find that grammatical role, definiteness, and particular semantic noun groups derived from Cornetto (a Dutch WordNet with referential relations) have a significant effect on the likelihood that a reader interprets an adjective-noun pair as having a degree reading.
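The modelling step described in the abstract can be illustrated with a minimal sketch, assuming each adjective-noun pair is reduced to three categorical features (grammatical role, definiteness, semantic noun group) plus a binary gradable label. The rows, the label names such as `predc` or `artifact`, and all function names below are invented for illustration; they are not taken from the LASSY Treebank, Cornetto, or the paper's data. The sketch only fits coefficients by gradient descent and does not compute the significance tests the abstract reports.

```python
import math

# Hypothetical toy rows: (grammatical_role, definiteness, semantic_group, label).
# label 1 = degree (gradable) reading, 0 = plain size reading.
# All values are invented for illustration, not drawn from any corpus.
ROWS = [
    ("predc", "indef", "human", 1),
    ("predc", "indef", "event", 1),
    ("obj1", "indef", "human", 1),
    ("predc", "indef", "artifact", 1),
    ("su", "def", "artifact", 0),
    ("obj1", "def", "artifact", 0),
    ("su", "def", "building", 0),
    ("su", "indef", "building", 0),
]


def one_hot_vocab(rows):
    """Map each (column index, value) pair to a one-hot feature index."""
    vocab = {}
    for row in rows:
        for col, value in enumerate(row[:-1]):
            vocab.setdefault((col, value), len(vocab))
    return vocab


def encode(features, vocab):
    """One-hot encode a feature tuple; unseen values are ignored."""
    vec = [0.0] * len(vocab)
    for col, value in enumerate(features):
        idx = vocab.get((col, value))
        if idx is not None:
            vec[idx] = 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit(rows, vocab, epochs=2000, lr=0.5):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    data = [(encode(r[:-1], vocab), r[-1]) for r in rows]
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(features, vocab, weights, bias):
    """Estimated probability that the pair receives a degree reading."""
    x = encode(features, vocab)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


vocab = one_hot_vocab(ROWS)
weights, bias = fit(ROWS, vocab)
```

In this toy setup, a positive fitted weight for a feature value means that value raises the estimated odds of a degree reading, which is the kind of effect the abstract reports for grammatical role, definiteness, and semantic group.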
Similar resources
Deriving de/het gender classification for Dutch nouns for rule-based MT generation tasks
Linguistic resources available in the public domain, such as lemmatisers, part-of-speech taggers and parsers, can be used for the development of MT systems: as separate processing modules or as annotation tools for the training corpus. For SMT this annotation is used for training factored models, and for rule-based systems a linguistically annotated corpus is the basis for creating analysis, ge...
The Other Pole of Degree Modification of Gradable Nouns by Size Adjectives: A Mandarin Chinese Perspective
Size adjectives can have degree readings when they modify gradable nouns. However, a cross-linguistic variation exists with respect to what type(s) of size adjectives in a particular language can have such readings. In English degree readings are available only for size adjectives that predicate bigness, and in Mandarin Chinese degree readings are available for all size adjectives irrespective ...
Crosslingual Countability Classification: English meets Dutch
This paper presents a range of methods for classifying Dutch nouns as countable, uncountable or plural only based on both Dutch and English data. The classification is based on the occurrence of countability specific linguistic features that are extracted from unannotated corpora. We show that in the absence of reliable Dutch gold standard data, cross-linguistic classification can be achieved o...
Semantic Clustering in Dutch: Automatically inducing semantic classes from large-scale corpora
Handcrafting semantic classes is a difficult and time-consuming job, and depends on human interpretation. Possible machine learning techniques would be much faster, and do not rely on interpretation, because they stick to the data. The goal of this research is to present some machine learning techniques that make it possible to achieve an automatic clustering of Dutch words. More particularly, ...
Semantics-based Multiword Expression Extraction
This paper describes a fully unsupervised and automated method for large-scale extraction of multiword expressions (MWEs) from large corpora. The method aims at capturing the non-compositionality of MWEs; the intuition is that a noun within a MWE cannot easily be replaced by a semantically similar noun. To implement this intuition, a noun clustering is automatically extracted (using distributio...